This is the RHQ 4.4 release. It was released on May 9th, 2012.
If upgrading from RHQ 4.2 you must first make a manual change to your database. Apply this change only if upgrading from RHQ 4.2, not earlier versions. Execute the following SQL to update the schema version from 2.114 to 2.115:
update rhq_system_config set property_value='2.115', default_property_value='2.115' where property_key='DB_SCHEMA_VERSION';
After this update proceed with the upgrade normally.
The autodiscovery portlet is no longer included on the default global dashboard. Go to Inventory->Discovery Queue to view the discovery queue, or edit the global dashboard and add the portlet back.
Some browsers (most likely WebKit-based ones such as Safari and Chrome) will not automatically forward you from the installer to the login page. If that happens, manually switch to
http://localhost:7080/coregui/
RHQ 4.4 requires Java 6. It may work on Java 7, but this has not yet been extensively tested (feedback is welcome).
There have been several changes made to enhance availability collection, reporting and alerting. For more on all of the changes to availability see: Availability Improvements.
In addition to DOWN and UP availability types RHQ now has UNKNOWN and DISABLED as well.
The UNKNOWN availability allows RHQ to better represent resources whose current availability is not known. The best example of this is when an agent is down. Its managed resources may be up or down; we don't know.
RHQ now allows users to mark resources DISABLED. This is an availability type primarily assigned by users, not by agent reporting. DISABLED resources will ignore availability reported by the agent. This is useful for planned outages or for resources that are expected to be down or have been taken down administratively. Since DISABLED resources are not DOWN, they are omitted from dashboard portlets and availability alerting scenarios.
When an agent is down its platform resource will be marked DOWN but all of the platform children will now be marked UNKNOWN to represent that the RHQ server is not getting updated. In the past the children were also marked as DOWN. Note that DISABLED children will be left as DISABLED (see more below).
Existing Goes DOWN alert conditions will not fire when a resource is set to UNKNOWN. So, the new availability assignment for down agents can affect existing alerting. The intent is to be more accurate and avoid false positives but if the prior behavior is desired the alert conditions should be updated to Goes NOT UP, which is a new option.
The introduction of new availability types forced changes to the way group availability is determined. Group availability is now determined with the following algorithm, evaluated top to bottom in the table below:
| Member Availability | Group Availability |
|---|---|
| Empty Group | Grey / EMPTY |
| All Down | Red / DOWN |
| Some Down or Unknown | Yellow / WARN |
| Some Disabled | Orange / DISABLED |
| All UP | Green / UP |
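The top-to-bottom evaluation of the group availability table can be sketched as follows. This is only an illustration of the rule order, not RHQ's actual implementation; the `Avail` enum and the `group_availability` function are hypothetical names introduced for this example.

```python
from enum import Enum


class Avail(Enum):
    """Member availability types named in these release notes."""
    UP = "UP"
    DOWN = "DOWN"
    UNKNOWN = "UNKNOWN"
    DISABLED = "DISABLED"


def group_availability(members):
    """Return the group availability label, checking rules in table order.

    `members` is a list of Avail values, one per group member (hypothetical input shape).
    """
    if not members:
        return "EMPTY"      # Grey
    if all(m is Avail.DOWN for m in members):
        return "DOWN"       # Red
    if any(m in (Avail.DOWN, Avail.UNKNOWN) for m in members):
        return "WARN"       # Yellow
    if any(m is Avail.DISABLED for m in members):
        return "DISABLED"   # Orange
    return "UP"             # Green: every member is UP
```

Because the rules are checked in order, a group with one DOWN member and one DISABLED member is reported as WARN rather than DISABLED.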
The remote API method ResourceManagerRemote.getLiveResourceAvailability() no longer returns null when the availability is unknown; it now returns AvailabilityType.UNKNOWN. This may affect existing remote clients or CLI scripts.
Note that 'Goes DOWN' alert conditions remain unchanged and are unaffected by the upgrade. They are satisfied as before, when the resource's availability changes from NOT DOWN to DOWN. Resources moving to UNKNOWN or DISABLED, however, will not meet the condition. There is now a 'Goes NOT UP' operator that matches when the state moves from UP to any other availability type.
In addition to Availability Change alert conditions, it is now possible to create Availability Duration conditions. 'Stays DOWN for Xm' will match if a resource goes from UP to DOWN and stays down for X minutes. 'Stays NOT UP' is similar, but applies to changes from UP to any other availability.
This is a major change. Previously, all resources were checked on every availability scan, which by default ran every five minutes. This could cause 'peak and valley' issues with CPU and/or memory spikes. It also provided no way to favor checking of critical resources and lower the priority of the many non-critical, service-level resources. With the changes:
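The difference between the alert operators can be illustrated with a few small predicates. This is a sketch based only on the wording above; the function names, the string values, and the way elapsed time is passed in are assumptions made for the example, not RHQ's alerting code.

```python
def goes_down(previous, current):
    """'Goes DOWN': availability changes from anything that is not DOWN to DOWN."""
    return previous != "DOWN" and current == "DOWN"


def goes_not_up(previous, current):
    """'Goes NOT UP': availability changes from UP to any other type."""
    return previous == "UP" and current != "UP"


def stays_down(changed_from, current, minutes_since_change, x_minutes):
    """'Stays DOWN for Xm': went from UP to DOWN and has remained DOWN for X minutes.

    `minutes_since_change` is a hypothetical input: minutes elapsed since the change.
    """
    return (
        changed_from == "UP"
        and current == "DOWN"
        and minutes_since_change >= x_minutes
    )
```

For example, a resource moving from UP to UNKNOWN (as happens to the children of a down agent) satisfies `goes_not_up` but not `goes_down`.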
Provide resource-level granularity for collecting avail information.
Every non-platform resource type will have a built-in metric called "AvailabilityType"
The value is in seconds
The new Availability metric schedule will be added automatically to all types in updated plugins. So, for upgrades, new versions (updated MD5) of current plugins must be deployed. Custom plugins must be rebuilt and redeployed to get the new metric schedule.
Previously an availability check was performed on all resources with a 5 minute interval, and all resources were checked in one pass. Now, availability checking is performed based on the Availability metric schedule. If not set in the plugin descriptor the resource type's default availability check interval is based on its category:
Server: 60s (1 minute)
Service: 600s (10 minutes)
Platform: not applicable; platform availability is determined by agent activity, not getAvailability() calls.
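The fallback from a plugin descriptor setting to the category default might be expressed like this. It is a minimal sketch; the function name, the category strings, and the `descriptor_interval` parameter are hypothetical and only mirror the defaults listed above.

```python
# Category defaults taken from the list above, in seconds.
DEFAULT_AVAIL_INTERVAL_SECONDS = {
    "SERVER": 60,    # 1 minute
    "SERVICE": 600,  # 10 minutes
}


def availability_interval(category, descriptor_interval=None):
    """Return the availability check interval in seconds, or None for platforms.

    `descriptor_interval` stands in for a value set in the plugin descriptor
    (hypothetical parameter); when it is None the category default applies.
    """
    if category == "PLATFORM":
        # Platform availability comes from agent activity, so no interval applies.
        return None
    if descriptor_interval is not None:
        return descriptor_interval
    return DEFAULT_AVAIL_INTERVAL_SECONDS[category]
```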
This means that Availability collection intervals can be set, like other metric schedules, at the Template, Group and Resource levels, and can be changed at the user's discretion. If the metric is disabled then affected resources will defer to their parent's availability type.
The Avail prompt command generates either a changes-only or a full report, and that is still true. Previously, however, it always performed an avail check on every resource. With the introduction of prioritized availability checking that is no longer the case: the avail check is performed only if there is no current availability for the resource or its scheduled time has passed. A new option, --force, can be specified to force the availability checks. Note that this option will increase execution time.
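The decision the avail prompt command now makes for each resource can be summarized in a small predicate. This is a sketch of the rule as described, not the agent's code; the function and parameter names are hypothetical, and treating a check as due exactly at its scheduled time is an assumption.

```python
def should_check(current_availability, scheduled_time, now, force=False):
    """Decide whether an availability check runs for one resource.

    current_availability: last known availability, or None if there is none yet.
    scheduled_time / now: timestamps in seconds (hypothetical representation).
    force: mirrors the --force option, which checks regardless of schedule.
    """
    if force:
        return True
    if current_availability is None:
        return True
    # Assumption: a check counts as due once its scheduled time has been reached.
    return now >= scheduled_time
```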
For best performance it is recommended that the collection interval for non-interesting resources be set to a large interval, or be disabled.
Availability checking now happens incrementally. The availability job runs at 30 second intervals and not every resource is checked on each pass. Instead, checking is spread out, still respecting the desired intervals as much as possible, but in a fashion that avoids the 'peak and valley' issues of the past.
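One way to picture incremental checking is a job that runs every 30 seconds and only checks the resources that are due on that pass. The sketch below is purely illustrative: the even staggering of first check times is an assumed strategy, and none of the names come from RHQ.

```python
PASS_INTERVAL = 30  # seconds between availability job runs, as stated above


def initial_schedule(resources):
    """Stagger first check times so resources do not all fall due together.

    `resources` is a list of (name, interval_seconds) pairs. The staggering
    rule is an assumption made for this sketch.
    """
    schedule = {}
    for index, (name, interval) in enumerate(resources):
        schedule[name] = (index * PASS_INTERVAL) % interval
    return schedule


def run_pass(now, schedule, intervals):
    """Check only the resources due at `now` and reschedule each one."""
    due = [name for name, due_time in schedule.items() if due_time <= now]
    for name in due:
        schedule[name] = now + intervals[name]
    return due


resources = [("server-a", 60), ("server-b", 60), ("service-c", 600)]
intervals = dict(resources)
schedule = initial_schedule(resources)
checked_per_pass = [run_pass(t, schedule, intervals) for t in (0, 30, 60)]
```

In this run each pass checks only part of the inventory, while every resource still keeps its own interval.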
Previously, back-filling of an agent's platform resources was performed after a 15 minute period of no communication from the agent, whether the agent was shut down gracefully or went down unexpectedly. This period is set by the AGENT_MAX_QUIET_TIME_ALLOWED system setting. The upgrade now sets this value to 5 minutes, a reduction made possible by architectural improvements. Also, agents that are shut down gracefully are now back-filled immediately.
Operations now have the ability to request an immediate availability check after completion. All of the RHQ plugins have been updated for any Start/Stop/Restart operations. So, availability should typically be updated within 60s of the operation completing and can be reflected in the UI if it is refreshed.
The REST API has been enhanced. This API is included to start the effort of building a REST interface into RHQ, so that the server is more accessible from other tools and languages.
This API IS NOT STABLE. Do not rely on it. IT WILL CHANGE.
To access the API, go to http://localhost:7080/rest/ (see also Design-REST and this blog post).
RESTEasy has been updated to version 2.3.2.Final
Updated Japanese translations by Fusaykui Minamoto
Initial Russian translations by Denis Krusko
The reports under /coregui/#Reports can now be exported in CSV format.
Several of the reports offer filtering capabilities to generate a fixed data set.
[bug 805987] The platform plugin now reports metrics for the actual free and actual used system memory.
The JBoss AS 7 plugin has been renamed. If you install RHQ 4.4 into an existing RHQ 4.3 database and had the as7 plugin installed before, you should remove it before the upgrade.
A new project, RHQ samples, is now available on GitHub. It collects additional sample code that works together with RHQ, including examples that access the REST API from programming languages other than Java.
The embedded agent may fail to find or register with the server, which means that it will not be able to discover and manage any resources. Please use an external agent (BZ 819766).
In the group display, resource counts may look wrong when you have resources sitting in the autodiscovery queue (BZ 819897).
The GWT part of the UI has been partially translated into German, Portuguese, Japanese, Chinese and Russian. The language should be selected automatically depending on your browser settings. You can explicitly access other translations by appending a locale specifier to the URL. For example, to select the German translation, append ?locale=de to the base URL, giving http://localhost:7080/coregui/?locale=de.
Supported locales are:
zh for Chinese
de for German
ja for Japanese
pt for Portuguese
ru for Russian
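Building a localized URL from the supported locales can be sketched as below. The helper name and the validation step are illustrative assumptions; the base URL and the `locale` query parameter are the ones given above.

```python
from urllib.parse import urlencode

# Locale codes and languages from the list above.
SUPPORTED_LOCALES = {
    "zh": "Chinese",
    "de": "German",
    "ja": "Japanese",
    "pt": "Portuguese",
    "ru": "Russian",
}


def localized_url(locale, base_url="http://localhost:7080/coregui/"):
    """Return the UI URL with an explicit locale specifier appended."""
    if locale not in SUPPORTED_LOCALES:
        raise ValueError(f"unsupported locale: {locale!r}")
    return base_url + "?" + urlencode({"locale": locale})


german_url = localized_url("de")
```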
Please ping us if you want to help translate the UI into your language. Translations are done via the translations project on GitHub, which also has some instructions on how to start.
Please report all bugs you find in Bugzilla. If you find a bug that is already recorded in the list above, please leave a comment on it, especially if it needs special steps to reproduce.
Please consult Bugzilla with a target release of RHQ 4.4.0 for a list of resolved issues.
You can download the release here.